Can transcriptome size be estimated from SAGE catalogs?

Authors

  • Michael D. Stern
  • Sergey V. Anisimov
  • Kenneth R. Boheler
Abstract

Motivation: SAGE (Serial Analysis of Gene Expression) can be used to estimate the number of unique transcripts in a transcriptome. A simple estimator that corrects for sequencing and sampling errors was applied to a SAGE library (137 832 tags) obtained from mouse embryonic stem cells, and also to Monte Carlo simulated libraries generated using assumed distributions of 'true' expression levels consistent with the data.

Results: When the corrected data themselves were taken as the underlying model of 'ground truth', the estimator converged to the 'true' value (53 535) only after counting 300 000 simulated tags, more than twice the number in the experiment. The SAGE data could also be fit well by a Monte Carlo model based on a truncated inverse-square distribution of expression levels, with 130 000 'true' transcripts and 10^6 samples needed for convergence. We conclude that the size of a transcriptome is ill-determined from SAGE libraries of even moderately large size. To obtain a valid estimate, one must sample a number of tags inversely proportional to the lowest abundance level, which is not known a priori. This constrains the design of SAGE experiments intended to determine biological complexity.

Availability: The 'homemade' software used for this analysis was not designed for general or 'production' use, but the authors will be happy to share the Fortran source code with interested parties.

Contact: [email protected]
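The sampling effect described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' Fortran code and it does not implement their error-correcting estimator; it only counts how many distinct transcripts have been seen after a given number of sampled tags. The function names, the level bounds (1 to 1000), and the library size (2000 transcripts) are assumptions chosen to keep the example fast.

```python
import random
from itertools import accumulate


def sample_levels(n_transcripts, low, high, rng):
    """Draw expression levels with density proportional to 1/x^2 on [low, high].

    Uses the inverse CDF of the truncated inverse-square density
    (an illustrative choice of bounds, not the paper's fitted values).
    """
    inv_low, inv_high = 1.0 / low, 1.0 / high
    return [
        1.0 / (inv_low - rng.random() * (inv_low - inv_high))
        for _ in range(n_transcripts)
    ]


def distinct_transcripts_seen(levels, checkpoints, rng):
    """Sample tags in proportion to expression level and record how many
    distinct transcripts have been observed at each checkpoint."""
    cumulative = list(accumulate(levels))
    draws = rng.choices(
        range(len(levels)), cum_weights=cumulative, k=max(checkpoints)
    )
    wanted = set(checkpoints)
    seen = set()
    curve = {}
    for n_tags, transcript in enumerate(draws, start=1):
        seen.add(transcript)
        if n_tags in wanted:
            curve[n_tags] = len(seen)
    return curve


rng = random.Random(0)  # fixed seed so the run is repeatable
levels = sample_levels(2000, 1.0, 1000.0, rng)
curve = distinct_transcripts_seen(levels, [1000, 5000, 20000], rng)
for n_tags, n_seen in sorted(curve.items()):
    print(f"{n_tags:>6} tags sampled -> {n_seen} distinct transcripts seen")
```

In a model like this, most transcripts sit near the lowest expression level, so the count of distinct transcripts keeps rising until the number of sampled tags is large relative to the inverse of the lowest relative abundance, which is the qualitative point the abstract makes.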

Related articles

Effect of cutting size and position on propagation ability of Sage (Salvia officinalis L.)

The investigation was conducted during the wet and dry seasons of Ethiopia in 2012/2013 at the Wondo Genet Agricultural Research Center nursery site. Four levels of cutting size and three levels of cutting position were arranged in a randomized complete block design with three replications. Data on seedling height, branch number/seedling, root number/seedling, root weight/seedling, root length/s...

A human glomerular SAGE transcriptome database

BACKGROUND To facilitate the identification of gene products important in regulating renal glomerular structure and function, we have produced an annotated transcriptome database for normal human glomeruli using the SAGE approach. DESCRIPTION The database contains 22,907 unique SAGE tag sequences, with a total tag count of 48,905. For each SAGE tag, the ratio of its frequency in glomeruli ...

Transcriptome annotation using tandem SAGE tags

Analysis of several million expressed gene signatures (tags) revealed an increasing number of different sequences, largely exceeding that of annotated genes in mammalian genomes. Serial analysis of gene expression (SAGE) can reveal new Poly(A) RNAs transcribed from previously unrecognized chromosomal regions. However, conventional SAGE tags are too short to identify unambiguously unique sites i...

Reproducibility, bioinformatic analysis and power of the SAGE method to evaluate changes in transcriptome

The serial analysis of gene expression (SAGE) method is used to study global gene expression in cells or tissues in various experimental conditions. However, its reproducibility has not yet been definitively assessed. In this study, we have evaluated the reproducibility of the SAGE method and identified the factors that affect it. The determination coefficient (R²) for the reproducibility of S...

Journal:
  • Bioinformatics

Volume 19, Issue 4

Pages: -

Publication date: 2003